A simple proof of the asymptotic property of Chi-square tests,
commonly used in the analysis of categorical data, is given for use as
a note for instruction to first-year graduate students.
1. Introduction

The Chi-square tests associated with the multinomial distribution
are commonly used in the analysis of categorical data, with reference to
problems of specification, homogeneity of parallel samples, independence
of attributes, etc. The asymptotic property of the tests, that is, the
Chi-square distribution of the test statistics in large samples, is generally
known. However, it has been our observation that many applied statisticians
tacitly accept the asymptotic result without satisfying themselves
of its proof. It is also true that nearly all the textbooks in use on
elementary and higher statistics either omit the proof or barely sketch
it. In this paper we outline a fairly simple proof of the fundamental
result, for use as a note for the instruction of first-year graduate
students and as the theory needed for the frequent application of Chi-square
tests by applied statisticians. The proof is essentially based on
the contents of Chapters 5 and 6 of Rao (1966).

Consider a multinomial distribution $M(n, \mathbf{p})$ with $K$ cells,
where $\mathbf{p} = (p_1, \ldots, p_K)'$ denotes the vector of cell probabilities,
$\mathbf{n} = (n_1, \ldots, n_K)'$ denotes the vector of cell frequencies resulting from
$n$ independent trials, $\sum_{i=1}^{K} p_i = 1$ and $\sum_{i=1}^{K} n_i = n$. A general problem
in judging goodness of fit is to test whether the cell probabilities
are specified functions of a smaller number of parameters whose values may
be unknown. Let the cell probabilities be given functions $p_1(\theta), \ldots, p_K(\theta)$
of an unknown vector $\theta = (\theta_1, \ldots, \theta_r)'$, where $r < K$. To test the
specification, it is a standard method to use the statistic

(1.1)  $T = \sum_{i=1}^{K} (n_i - n p_i(\hat\theta))^2 / (n p_i(\hat\theta))$

where $\hat\theta$ is a consistent estimator of $\theta$, usually the maximum likelihood
estimator. Next, assuming that the specification is true, consider the
hypothesis that $\theta$ is given by $\theta_i = g_i(\alpha)$, $i = 1, \ldots, r$, where
$g_1, \ldots, g_r$ are given functions, $\alpha = (\alpha_1, \ldots, \alpha_s)'$ and $s < r$.
This hypothesis arises in a test of homogeneity of parallel samples and of
independence in a contingency table. The statistic

(1.2)  $T^* = n \sum_{i=1}^{K} (p_i(\hat\theta) - p_i(\hat\alpha))^2 / p_i(\hat\alpha)$

is used to test the hypothesis, where $\hat\alpha$ denotes an estimate of $\alpha$ and
$p_i(\alpha)$ denotes the value of $p_i(\theta)$ as a function of $\alpha$, under the given
hypothesis. It is shown below that $T$ and $T^*$ are asymptotically distributed
for large $n$ according to the Chi-square distribution under certain
conditions.
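As a concrete illustration of (1.1), the following sketch evaluates $T$ for a hypothetical three-cell model with a single parameter, $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$. The model, the counts and the function names are assumptions chosen for illustration and are not part of the argument; the closed-form maximum likelihood estimator used here is specific to this model.

```python
def cell_probs(theta):
    """Hypothetical specification with K = 3 cells and r = 1 parameter."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def mle(counts):
    """Maximum likelihood estimator of theta for this particular model."""
    n = sum(counts)
    return (2 * counts[0] + counts[1]) / (2 * n)


def statistic_T(counts, probs):
    """Statistic (1.1): sum over cells of (n_i - n p_i)^2 / (n p_i)."""
    n = sum(counts)
    return sum((n_i - n * p_i) ** 2 / (n * p_i)
               for n_i, p_i in zip(counts, probs))


counts = [30, 50, 20]                     # illustrative cell frequencies
theta_hat = mle(counts)                   # estimate of theta
T = statistic_T(counts, cell_probs(theta_hat))
print(theta_hat, T)
```

For these counts the estimate is 0.55 and $T$ is roughly 0.0102, a value that would be referred to a Chi-square distribution with $K - r - 1 = 1$ degree of freedom.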
2. Asymptotic Distribution of T and T*


First, consider the specification that the multinomial cell probabilities
are given functions $p_1(\theta), \ldots, p_K(\theta)$ of an unknown vector
$\theta = (\theta_1, \ldots, \theta_r)'$, where $r < K$. Let $\theta^0$ denote the true value of $\theta$. We
make the following assumptions:

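(i)   The functions $p_i(\theta)$ are continuous in $\theta$, admitting first order
      partial derivatives which are continuous at $\theta^0$.

(ii)  Given $\delta > 0$, there exists $\varepsilon > 0$ such that
      $\inf_{|\theta - \theta^0| > \delta} N(\theta) > \varepsilon$, where

      $N(\theta) = \sum_{i=1}^{K} p_i(\theta^0) \log\,(p_i(\theta^0)/p_i(\theta))$.

Let

      $M(\theta) = \sum_{i=1}^{K} (n_i/n) \log\,(p_i(\theta^0)/p_i(\theta))$.
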
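Consider the function $N(\theta)$ on the sphere $|\theta - \theta^0| = \delta$. Since $N(\theta)$ is
continuous in $\theta$, the infimum of $N(\theta)$ is attained on the sphere. Therefore, in
view of (ii), $N(\theta) \ge \varepsilon$ for every point on the sphere. Since $n_i/n$ converges
in probability to $p_i(\theta^0)$ as $n \to \infty$, the function $M(\theta)$ converges in probability
to $N(\theta)$, and it follows that $M(\theta) > 0$ for all points
on the sphere with probability approaching 1 as $n \to \infty$.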
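The roles of $N(\theta)$ and its sample version can be made concrete with a small numerical sketch. It assumes a hypothetical one-parameter model $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$, a true value of 0.3 and invented counts; with $r = 1$ the sphere of radius $\delta$ reduces to two points.

```python
import math


def cell_probs(theta):
    """Hypothetical specification with K = 3 cells and r = 1 parameter."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def N(theta, theta0):
    """N(theta) = sum_i p_i(theta0) * log(p_i(theta0) / p_i(theta))."""
    return sum(p0 * math.log(p0 / p)
               for p0, p in zip(cell_probs(theta0), cell_probs(theta)))


def M(theta, theta0, counts):
    """Sample version of N: the weight p_i(theta0) is replaced by n_i / n."""
    n = sum(counts)
    return sum((n_i / n) * math.log(p0 / p)
               for n_i, p0, p in zip(counts, cell_probs(theta0),
                                     cell_probs(theta)))


theta0, delta = 0.3, 0.1
sphere = [theta0 - delta, theta0 + delta]   # two points when r = 1
counts = [12, 40, 48]                       # illustrative frequencies, n = 100

N_on_sphere = [N(t, theta0) for t in sphere]
M_on_sphere = [M(t, theta0, counts) for t in sphere]
print(N_on_sphere, M_on_sphere)
```

In this example both functions are strictly positive at the two sphere points, while $N(\theta^0) = 0$, which is the situation the argument relies on.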


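The log likelihood function is, apart from a constant, $\sum_{i=1}^{K} n_i \log p_i(\theta)$,
so that $M(\theta) > 0$ means that the likelihood at $\theta^0$ exceeds the likelihood at $\theta$.
In view of (i) and the result given above, we have that for sufficiently
large $n$, the likelihood function has a local maximum inside the open
sphere $|\theta - \theta^0| < \delta$ at a point $\hat\theta$, say, which is a solution of the likelihood
equation

(2.1)  $\sum_{i=1}^{K} \dfrac{n_i}{p_i(\theta)} \dfrac{\partial p_i(\theta)}{\partial \theta_j} = 0$,  $j = 1, \ldots, r$.

Since $\delta$ can be made arbitrarily small, the maximum likelihood estimator
$\hat\theta$ is a consistent solution of the likelihood equation.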
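The likelihood equation (2.1) can be checked numerically in a simple case. The sketch below again assumes the hypothetical model $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$ with invented counts; the closed-form estimator is specific to that model and the function names are illustrative.

```python
def cell_probs(theta):
    """Hypothetical specification with K = 3 cells and r = 1 parameter."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def cell_derivs(theta):
    """Derivatives dp_i/dtheta of the three cell probabilities."""
    return [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]


def score(theta, counts):
    """Left side of (2.1): sum_i (n_i / p_i(theta)) * dp_i(theta)/dtheta."""
    return sum(n_i / p * d
               for n_i, p, d in zip(counts, cell_probs(theta),
                                    cell_derivs(theta)))


counts = [30, 50, 20]                                        # illustrative
theta_hat = (2 * counts[0] + counts[1]) / (2 * sum(counts))  # model-specific MLE

print(score(theta_hat, counts))   # numerically zero at the estimate
print(score(0.5, counts), score(0.6, counts))
```

The left side of (2.1) changes sign from positive to negative as $\theta$ passes through the estimate, as expected at a local maximum of the likelihood.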
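Let $\mathcal{I}(\theta) = (M(\theta))'(M(\theta))$ denote the information matrix of the
multinomial distribution, where

      $M(\theta) = \left( \dfrac{\partial p_i(\theta)}{\partial \theta_j} \Big/ \sqrt{p_i(\theta)} \right)$

is a $K \times r$ matrix (the symbol $M$ is reused here for a matrix and is unrelated
to the scalar function $M(\theta)$ of the consistency argument), and let $Z = M'V$,
where $M = M(\theta^0)$ and

      $V = \left( \dfrac{n_1 - n p_1(\theta^0)}{\sqrt{n p_1(\theta^0)}}, \ldots, \dfrac{n_K - n p_K(\theta^0)}{\sqrt{n p_K(\theta^0)}} \right)'$.

By the central limit theorem, the asymptotic distribution of $V$ is multivariate
normal $N(0, I - \phi\phi')$, where $0$ denotes the null vector, $I$ denotes
the identity matrix and $\phi = (\sqrt{p_1(\theta^0)}, \ldots, \sqrt{p_K(\theta^0)})'$. The asymptotic
distribution of $Z$ is $N(0, \mathcal{I})$, where $\mathcal{I} = \mathcal{I}(\theta^0)$. Note that $M'\phi = 0$,
since $\sum_{i=1}^{K} \partial p_i(\theta)/\partial \theta_j = 0$ for every $j$.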
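The two algebraic facts used here, that the information matrix equals $M'M$ and that $M'\phi = 0$, can be verified numerically. The sketch assumes the hypothetical model $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$, for which the information works out to $2/(\theta(1-\theta))$; the model and names are illustrative only.

```python
import math


def cell_probs(theta):
    """Hypothetical specification with K = 3 cells and r = 1 parameter."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def cell_derivs(theta):
    """Derivatives dp_i/dtheta of the three cell probabilities."""
    return [2 * theta, 2 - 4 * theta, -2 * (1 - theta)]


def M_column(theta):
    """The K x 1 matrix M(theta): entries (dp_i/dtheta) / sqrt(p_i)."""
    return [d / math.sqrt(p)
            for p, d in zip(cell_probs(theta), cell_derivs(theta))]


def phi(theta):
    """The vector of square roots of the cell probabilities."""
    return [math.sqrt(p) for p in cell_probs(theta)]


theta0 = 0.3
m = M_column(theta0)
information = sum(x * x for x in m)                      # M'M, 1 x 1 here
m_dot_phi = sum(x * y for x, y in zip(m, phi(theta0)))   # M' phi

print(information, 2 / (theta0 * (1 - theta0)), m_dot_phi)
```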


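Substituting $\hat\theta$ for $\theta$ in (2.1), and using $\sum_{i=1}^{K} \partial p_i(\theta)/\partial \theta_j = 0$,
the $j$th equation can be written as

(2.2)  $\sum_{i=1}^{K} \dfrac{n_i - n p_i(\theta^0)}{\sqrt{n}\, p_i(\hat\theta)} \dfrac{\partial p_i(\hat\theta)}{\partial \theta_j} = \sum_{i=1}^{K} \dfrac{\sqrt{n}\,(p_i(\hat\theta) - p_i(\theta^0))}{p_i(\hat\theta)} \dfrac{\partial p_i(\hat\theta)}{\partial \theta_j}$.
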


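In view of (i) we have

(2.3)  $p_i(\hat\theta) - p_i(\theta^0) = \sum_{l=1}^{r} (\hat\theta_l - \theta_l^0)\, \dfrac{\partial p_i(\theta^0)}{\partial \theta_l} + \eta\, |\hat\theta - \theta^0|$

where $\eta \to 0$ as $\hat\theta \to \theta^0$. Since $\hat\theta$ is a consistent estimator of $\theta^0$, as shown
above, the left side of (2.2) is asymptotically equivalent (converging
in probability) to $Z_j$. Therefore, by the substitution of (2.3) on the
right side of (2.2) we have that

      $Z_j \simeq \sum_{l=1}^{r} \sqrt{n}\,(\hat\theta_l - \theta_l^0)\, \mathcal{I}_{lj}$,  $j = 1, \ldots, r$,

where $\simeq$ means "asymptotically equivalent to", and $\mathcal{I}_{lj}$ denotes the $lj$th
element of the information matrix $\mathcal{I} = \mathcal{I}(\theta^0)$. Hence

      $Z \simeq \sqrt{n}\, \mathcal{I}\, (\hat\theta - \theta^0)$

or

(2.4)  $\mathcal{I}^{-} Z \simeq \sqrt{n}\, (\hat\theta - \theta^0)$

where $\mathcal{I}^{-}$ denotes a generalized inverse of $\mathcal{I}$, given by $\mathcal{I}\, \mathcal{I}^{-} \mathcal{I} = \mathcal{I}$, and is
equal to $\mathcal{I}^{-1}$ if $\mathcal{I}$ is non-singular.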

Let $A$ be a symmetric matrix with real elements, and let $X \sim N(\mu, \Sigma)$,
where $\sim$ means "distributed as". If the covariance matrix $\Sigma$ is non-singular
then it is known that the quadratic form $X'AX \sim \chi^2_{\nu,\delta}$ (non-central
Chi-square with $\nu$ degrees of freedom and non-centrality parameter $\delta$) if and
only if $A\Sigma$ is idempotent, where $\nu = \mathrm{Rank}\, A$ and $\delta = \tfrac{1}{2}\mu'A\mu$. If $\Sigma$ is
singular, then the given condition is only sufficient and $\nu = \mathrm{Rank}\, A\Sigma$
(see e.g. Graybill (1976), Theorem 4.7.1). If $A = \Sigma^{-}$ is
a generalized inverse of $\Sigma$, given by $\Sigma\, \Sigma^{-} \Sigma = \Sigma$, then $A\Sigma$ is idempotent
and $\mathrm{Rank}\, A\Sigma = \mathrm{Rank}\, \Sigma$. Therefore, $X'\Sigma^{-}X \sim \chi^2_{\nu,\delta}$, where $\nu = \mathrm{Rank}\, \Sigma$
and $\delta = \tfrac{1}{2}\mu'\Sigma^{-}\mu$.
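A small numerical check may help fix ideas about the idempotency condition. The sketch below takes the singular covariance matrix $\Sigma = I - \phi\phi'$ that arises for the multinomial, with $A = I$, and confirms that $A\Sigma$ is idempotent with trace (hence rank) $K - 1$. The cell probabilities and helper names are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


p = [0.2, 0.3, 0.5]                       # illustrative cell probabilities
K = len(p)
phi = [math.sqrt(x) for x in p]
Sigma = [[(1.0 if i == j else 0.0) - phi[i] * phi[j] for j in range(K)]
         for i in range(K)]

# With A = I, the product A * Sigma is Sigma itself.
idempotency_gap = max_abs_diff(matmul(Sigma, Sigma), Sigma)
rank = trace(Sigma)                       # rank equals trace when idempotent
print(idempotency_gap, rank)
```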

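Let $W = (W_1, \ldots, W_K)'$ where

      $W_i = \sqrt{n}\,(p_i(\hat\theta) - p_i(\theta^0)) / \sqrt{p_i(\theta^0)}$,  $i = 1, \ldots, K$.

From (2.3) we have that

(2.5)  $W \simeq \sqrt{n}\, M (\hat\theta - \theta^0) \simeq M \mathcal{I}^{-} Z$  (by (2.4))  $= M \mathcal{I}^{-} M' V$,

where $\simeq$ denotes asymptotic equivalence and $\mathcal{I}^{-}$ is a generalized inverse of
the information matrix $\mathcal{I} = \mathcal{I}(\theta^0)$.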

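Note that $M \mathcal{I}^{-} M'$ is idempotent, where $\mathcal{I}^{-}$ is a generalized inverse of the
information matrix $\mathcal{I} = M'M$. Since
$n_i - n p_i(\hat\theta) = (n_i - n p_i(\theta^0)) - n\,(p_i(\hat\theta) - p_i(\theta^0))$ and $p_i(\hat\theta)$ converges
in probability to $p_i(\theta^0)$, from (1.1) and (2.5) we have

      $T \simeq (V - W)'(V - W) \simeq V'(I - M \mathcal{I}^{-} M')V$,

where $\simeq$ denotes asymptotic equivalence. Now, $V$ is asymptotically distributed
as $N(0, I - \phi\phi')$, the matrix
$(I - M \mathcal{I}^{-} M')(I - \phi\phi') = I - M \mathcal{I}^{-} M' - \phi\phi'$ is idempotent and

      $\mathrm{Rank}\,(I - M \mathcal{I}^{-} M' - \phi\phi') = \mathrm{Trace}\,(I - M \mathcal{I}^{-} M' - \phi\phi')$
                                      $= K - 1 - \mathrm{Rank}\, \mathcal{I} = \beta$, say.

Therefore, $T$ is asymptotically distributed as $\chi^2_{\beta}$. If $\mathcal{I}$ is of full rank then
$\beta = K - r - 1$.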
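The idempotency and the trace computation can be checked numerically in a simple case. The sketch assumes the hypothetical model $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$ with $K = 3$ and $r = 1$, so the information matrix is a scalar and its inverse is an ordinary reciprocal; all names are illustrative.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


theta0 = 0.3
K, r = 3, 1
p = [theta0 ** 2, 2 * theta0 * (1 - theta0), (1 - theta0) ** 2]
d = [2 * theta0, 2 - 4 * theta0, -2 * (1 - theta0)]   # dp_i/dtheta

m = [d_i / math.sqrt(p_i) for d_i, p_i in zip(d, p)]  # the K x 1 matrix M
phi = [math.sqrt(p_i) for p_i in p]
info = sum(x * x for x in m)                          # M'M, a scalar here

# P = I - M (M'M)^{-1} M' - phi phi'
P = [[(1.0 if i == j else 0.0) - m[i] * m[j] / info - phi[i] * phi[j]
      for j in range(K)] for i in range(K)]

PP = matmul(P, P)
idempotency_gap = max(abs(PP[i][j] - P[i][j])
                      for i in range(K) for j in range(K))
beta = sum(P[i][i] for i in range(K))                 # trace of P
print(idempotency_gap, beta)
```

Here the trace equals $K - r - 1 = 1$ up to rounding, in agreement with the degrees of freedom obtained for $T$.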

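Next, consider the hypothesis that $\theta$ is given by $\theta_i = g_i(\alpha)$,
$i = 1, \ldots, r$, as described in the previous section. Let $\nabla(\alpha) = (\partial \theta_i/\partial \alpha_j)$
denote the $r \times s$ matrix of the derivatives, and let $\mathcal{I}^*(\alpha)$ denote the
information matrix under the given hypothesis. Then

(2.6)  $\mathcal{I}^*(\alpha) = (\nabla(\alpha))'\, \mathcal{I}(\theta)\, \nabla(\alpha)$

with $\mathcal{I}(\theta)$, the information matrix of the specification, being expressed as a
function of $\alpha$. Let $\alpha^0$ denote the true value of $\alpha$. Similarly, as in the
preceding we have that

      $T^* \simeq V'(M \mathcal{I}^{-} M' - M \nabla (\nabla' \mathcal{I} \nabla)^{-} \nabla' M')V$

where $\simeq$ denotes asymptotic equivalence, $\nabla = \nabla(\alpha^0)$ and $\mathcal{I} = \mathcal{I}(\theta^0)$. Since
$M'\phi = 0$, the matrix
$(M \mathcal{I}^{-} M' - M \nabla (\nabla' \mathcal{I} \nabla)^{-} \nabla' M')(I - \phi\phi') = M \mathcal{I}^{-} M' - M \nabla (\nabla' \mathcal{I} \nabla)^{-} \nabla' M'$
is idempotent and

      $\mathrm{Rank}\,(M \mathcal{I}^{-} M' - M \nabla (\nabla' \mathcal{I} \nabla)^{-} \nabla' M') = \mathrm{Rank}\, \mathcal{I} - \mathrm{Rank}\,(\nabla' \mathcal{I} \nabla)$
                                                    $= \gamma$, say.

Therefore, $T^*$ is asymptotically distributed as $\chi^2_{\gamma}$. If $\mathcal{I}$ and $\nabla$ are of full rank
then $\gamma = r - s$.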

We have shown that T and T* are asymptotically distributed according to the
Chi-square distribution under the continuity assumption (i) and the
identifiability condition (ii).
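The asymptotic result for $T$ can be illustrated, though of course not proved, by simulation. The sketch below assumes the hypothetical model $p(\theta) = (\theta^2,\ 2\theta(1-\theta),\ (1-\theta)^2)$ with true value 0.3, draws repeated multinomial samples, and compares the average of $T$ with the mean of a Chi-square distribution with $K - r - 1 = 1$ degree of freedom, which is 1. The sample size, number of replications and seed are arbitrary choices.

```python
import random


def cell_probs(theta):
    """Hypothetical specification with K = 3 cells and r = 1 parameter."""
    return [theta ** 2, 2 * theta * (1 - theta), (1 - theta) ** 2]


def simulate_T(theta0, n, rng):
    """Draw one multinomial sample of size n and return the statistic T."""
    p = cell_probs(theta0)
    counts = [0, 0, 0]
    for _ in range(n):
        u = rng.random()
        if u < p[0]:
            counts[0] += 1
        elif u < p[0] + p[1]:
            counts[1] += 1
        else:
            counts[2] += 1
    theta_hat = (2 * counts[0] + counts[1]) / (2 * n)   # model-specific MLE
    fitted = cell_probs(theta_hat)
    return sum((c - n * q) ** 2 / (n * q) for c, q in zip(counts, fitted))


rng = random.Random(12345)                  # fixed seed for reproducibility
values = [simulate_T(0.3, 300, rng) for _ in range(2000)]
mean_T = sum(values) / len(values)
print(mean_T)                               # should be close to 1
```

A fuller check would compare the empirical quantiles of the simulated values with those of the Chi-square distribution rather than the mean alone.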

Remark: For the goodness of fit test where the cell probabilities are
completely specified we have $\beta = K - 1$. In this case $T = V'V$, which is
asymptotically distributed as $\chi^2_{K-1}$.

For testing homogeneity of r parallel samples or independence of attributes in r x K
contingency tables, we have $\gamma = (r-1)(K-1)$.
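For the independence case, the statistic (1.2) can be computed directly from a table. The sketch assumes that the unrestricted cell proportions serve as $p_i(\hat\theta)$ and that the products of the marginal proportions serve as $p_i(\hat\alpha)$, which are the usual maximum likelihood estimates in this setting; the table and function name are invented for illustration.

```python
def independence_statistic(table):
    """Return (T*, degrees of freedom) for a two-way table of counts."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    t_star = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            p_full = n_ij / n                           # p_i(theta_hat)
            p_null = row_tot[i] * col_tot[j] / n ** 2   # p_i(alpha_hat)
            t_star += n * (p_full - p_null) ** 2 / p_null
    df = (len(table) - 1) * (len(table[0]) - 1)         # (r - 1)(K - 1)
    return t_star, df


table = [[10, 20, 30],
         [20, 20, 20]]                                  # illustrative 2 x 3 table
t_star, df = independence_statistic(table)
print(t_star, df)
```

With these estimates $T^*$ coincides with the familiar sum of (observed minus expected) squared over expected; for the table shown it equals 16/3 with 2 degrees of freedom.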